How can I preprocess text in Python to achieve specific tokenization results

join shbcf.ru